CWIG3G2 - Complex Word Identification Task across Three Text Genres and Two User Groups
نویسندگان
چکیده
Complex word identification (CWI) is an important task in text accessibility. However, due to the scarcity of CWI datasets, previous studies have only addressed this problem on Wikipedia sentences and have solely taken into account the needs of non-native English speakers. We collect a new CWI dataset (CWIG3G2) covering three text genres (NEWS, WIKINEWS, and WIKIPEDIA) annotated by both native and non-native English speakers. Unlike previous datasets, we cover single words, as well as complex phrases, and present them for judgment in a paragraph context. We present the first study on cross-genre and cross-group CWI, showing measurable influences in native language and genre types.
منابع مشابه
Task Difficulty and Its Components: Are They Alike or Different across Different Macro-genres?
Task difficulty across different macro-genres continues to remain among less attended areas in second language development studies. This study examined the correlation between task difficulty across the descriptive, narrative, argumentative, and expository macro-genres. The three components of task difficulty (i.e., code complexity, cognitive complexity, and communicative stress) were also comp...
متن کاملAuthor gender identification from text using Bayesian Random Forest
Nowadays high usage of users from virtual environments and their connection via social networks like Facebook, Instagram, and Twitter shows the necessity of finding out shared subjects in this environment more than before. There are several applications that benefit from reliable methods for inferring age and gender of users in social media. Such applications exist across a wide area of fields,...
متن کاملACADEMIC WRITING REVISITED: A PHRASEOLOGICAL ANALYSIS OF APPLIED LINGUISTICS HIGH-STAKE GENRES FROM THE PERSPECTIVE OF LEXICAL BUNDLES
Lexical bundles are frequent word combinations that commonly appear in different registers. They have been the subject of much research in the area of corpus linguistics during the last decade. While most previous studies of bundles have mainly focused on variations in the use of these word combinations across different registers and a number of disciplines, not much research has been done to e...
متن کاملAssessing Reading Comprehension of Expository Text across Different Response Formats
This study investigated if different response formats (test methods) measure reading comprehension of expository text differently. The study was conducted with 48 semester 6 TESL students at a university in Selangor, Malaysia. These students received an expository passage having descriptive rhetorical structure followed by three response formats, namely, incomplete outline, graphic organizer, a...
متن کاملPublished vs. Postgraduate Writing in Applied Linguistics: The Case of Lexical Bundles
Abstract: Lexical bundles, as building blocks of coherent discourse, have been the subject of much research in the last two decades. While many of such studies have been mainly concerned with exploring variations in the use of these word sequences across different registers and disciplines, very few have addressed the use of some particular groups of lexical bundles within some gen...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017